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ABSTRACT 

We investigate the properties of dark matter substructure in the gravitational 
lens HE 0435—1223 [z\ — 0.455) via its effects on the positions and flux ratios of 
the quadruply-imaged background quasar (z s — 1.689). We start with a smooth mass 
model, add individual, truncated isothermal clumps near the lensed images, and use 
the Bayesian evidence to compare the quality of different models. Compared with 
smooth models, models with at least one clump near image A are strongly favoured. 
The mass of this clump within its Einstein radius is log 10 (Mg in ) = 7.65_ 



1.87 
-0.84 



m units 



of h 7 Q Mq). The Bayesian evidence provides weaker support for a second clump near 
image B, with log 10 (Mg in ) = 6.55^1^ ■ We also examine models with a full population 
of substructure, and find the mass fraction in substructure at the Einstein radius to 
be / su b > 0.00077, assuming the total clump masses follow a mass function dN/dM cx 
M~ 1,9 over the range M = 1O 7 -1O 1O M0. Few-clump and population models produce 
similar Bayesian evidence values, so neither type of model is objectively favoured. 



1 INTRODUCTION 



A tension has arisen between the cold dark matter (CDM) 
paradigm and certain astronomical observations. On the the- 
ory side, N-body simulations have reached consensus that 
galaxy-scale dark matter halos should contain many bound 
subhalos that follow a power-law mass function, dN/dM oc 
M a . Probing from ~ 10 1 M Q down to ~ 10 4 M Q , simula- 
tions such as Via Lactea ( Diemand et al.||2007[ ) and Aquar- 
ius ( jSpringel et al.|[2008[ ) predict a mass function slope of 
a ss —1.9 and a fractional amount of substructure in the 
vicinity of 8-11% (depending to some extent on resolution). 

Observationally, however, the prediction of dark matter 
substructure has not been confirmed. Various surveys have 
sought to characterise the abundance, masses, and spatial 
distribution of low-mass galaxies in the Local Group (e.g., 
Simon fc Geha|2007| [Kalirai et al.|2010[ ). Before 2005, only 
the 11 most massive and luminous Milky Way satellites had 
been fou nd (|Mateo|1998|). After 2005, the Sloan Digital Sky 
Survey ( York et al. 2000 1 made it possible to detect ex- 
tremely faint satellites (e.g.,|W"illman et al.|2005||Irwin et al. 



2007 Liu et al. 2008 



2010). These 



Belokurov et al.||2009 

"ultra-faint" dwarfs, with absolute magnitudes as low as 
My ~ —2, have more than doubled the number of Milky 



Way satellites to ~ 25 (for a current list, see Wadepuhl & 
Springcl 2011). Yet, despite this dramatic leap forward, the 
number of satellites still falls severely short of the hundreds 
predicted by N-body simulations ( Klypin et al.|1999 Moore 
et al.|1999| |. 

A clear contributor to this disparity is the lack of a 
complete and thorough survey of the local volume. Indeed, 



while a huge improvement over previous surveys, the SDSS 
is limited in both sky coverage (~l/5 of the sky) and depth 
(g < 22.2). Attempts to account for these limitations sug- 
gest that a volumetrically complete survey will find many 
more satellites and may eliminate the problem altogether 
( Tollerud et al.| 2008). Those estimates depend, however, on 
extrapolations from the currently known population, and 
it is quite plausible that even a complete survey will not 
find all the predicted satellites. If so, any remaining dis- 
crepancy between theoretical predictions and observations 
will presumably be attributed to the intrinsic luminosities 
of low-mass dwarfs. Satellites with total mass < 1O 7 M can 
experience suppressed or even quenched star formation (e.g., 
Strigari et al.|2007||Macci6 et al.|2010 |. Cosmic reionisation, 



UV photo-evaporation, ram pressure or tidal stripping, su- 
pernovae, and cosmic rays may all play a role in hamper- 



ing the conditions for star formation ( 


Gnedin 


2000; Scan- 


|napieco et al.||2001| |Strigari et al.||2007 


Madau et al. 2008 



Mashchcnko et al. 2008 



|2010| |Wadepuhl fc Springel||2011[ ). While the precise mech 



anisms arc still debated, the plausibility of such arguments 
points to a large population of "dark dwarfs", whose lumi- 
nosities are so low that they will elude traditional observa- 
tional techniques. 

Intriguingly, while local observations of satellite galax- 
ies seem to fall short of CDM predictions, measurements in 
more distant galaxies exhibit the opposite conflict. Sensi- 
tive to mass alone, strong gravitational lensing provides a 
unique tool to detect low-mass subhalos in cosmologically 



distant galaxies, regardless of their luminosities (e.g., |Dalal 



& Kochanek 2002 Vegetti et al. 2010b I . On large angular 
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scales (~ 1"), the bulk properties of multiply-imaged quasars 
are determined by the macroscopic mass distribution of the 
lens galaxy and its surrounding environment. Upon detailed 
inspection, however, the properties of the images may be 
perturbed by small-scale structure in the mass distribution 
||Mao fc Schneider|1998||Chiba|2002||Metcalf fc Madau|200l[ 
Dalai fc Kochanek|2002[|Metcalf fc Zhao|2002| |Bradac et all 
200 2[ |Koopmans et al.||2 0021 |Chen et al.||2007| |Keeton fc 



Moustakas 



2009 



Keeton 



20091. Thus, with the positions, 



flux ratios, and time delays of lensed images we may be able 
to measure the properties of small-scale structure in lens 
galaxies. 

Currently, some of the best constraints on dark matter 
substructure (outside of the Local Group) come from the 
analysis of "anomalous" flux ratios in four-image gravita- 
tional lenses. Many lenses have flux ratios that violate uni- 



versal relations predicted for smooth mass models (Keeton 
et al.[ 2003, 20051. Performing a statistical analysis of seven 



lenses, Dalai & Kochanek ( 2002 I found the mass fraction in 



substructure to be 0.006 < / su b < 0.07 (90% confidence) at 
the Einstein radii of the lenses. This stands in contrast to 
CDM predictions, which yield / au b ~ 0.001-0.003 at sim- 
ilar projected radii (|Mao et al.||2004| |Amara et al"1|2006| 



Maccio et al. 



2006 Maccio & Mirandal 120061). In particu- 



lar, Xu et al. (20101 recently found that N-body simulations 



predict / su b ~ 0.002 at typical Einstein radii even when con- 
sidering other sources of small-scale structure beyond dark 
matter substructure (e.g., globular clusters, stellar streams). 

Observational constraints, therefore, seem at odds. Tal- 
lies of Milky Way satellites seem to indicate a dearth of 
substructure, while lensing points to a surplus. Confronting 
this on the lensing side, there is great interest in expand- 
ing both the list of observables and the sample of lenses 
used to probe substructure. For example, infrared observa- 
tions of lenses have begun to increase the number of quasar 
lenses available for flux ratio studies (e.g., Chiba et al.|2005 



ton 



2011|) . Image positions (|Chen et al. [2007 1 and time 



delays ( Keeton & Moustakas 2009 1 can complement flux 



MacLeod et~1dT1|2T)09"l |Minezaki et al.||20C)9| |Fadely fc Kee-| 



ratios by providing different sensitivity to substructure in 
quasar lenses. Also, Einstein ring images offer a new way to 



probe substructure in galaxy-galaxy strong lenses ( Vegetti & 
Koopmans 2009b; Veget ti et al.|2010a|b[ ). In particular, |Veg- 



etti et al.| ( |2010b[) recently used a Bayesian analysis to infer 
/sub = 0.0215+o:oi25 m the lens SDSS J0946+1006, assum- 
ing a = —1.9 ± 0.1 for a mass range from Af tota i = 10 6 ' 6 M@ 
to 1O 9 - 6 M . 

In this paper we investigate the properties of the four 
image gravitational lens HE 0435-1223 (hereafter HE0435), 
selected for its relatively bright (F160W < 18.1) and well 
separated (2.4") images (see Fig. [l]). Since its discovery 
(Wisotzki et al. 1120021), HE0435 has been extensively stud- 



ied using ground- and space-based observations. From the 
ground, optical spectroscopy provided early evidence for 
stellar microlensing and against significant differential dust 
extinction in the lens (Wisotzki et al. 20031. More re- 



cently, optical monitoring has quantified the intrinsic and 
microlensing variability, and also revealed the time delays 
between images ( Kochanek et al.|2006 Courbin et al.|20To I . 
Hubble Space Telescope imaging provided photometric evi- 



dence that the lens lies in an overdense environment (Mor- 



al 

M - 1 



-3 



I J I I I I I I I I I | I I I I I I I I I | I I I I I I I I I J I I I I I I I I I | I I I I I I I I I | I 




-2-10 1 2 3 

X (arcsec) 

Figure 1. Schematic diagram of the lens HE 0435-1223. The size 
of ellipses depicting the main lens Gl and nearby galaxy G22 are 
set to the effective radii measured in HST images. For G22, no 
measurement of the ellipticity is given in the literature. Dotted 
boxes present the spatial regions used as priors for the clump 
positions in Section [5] after an initial MCMC exploration. 



confirmed the presence of a group of galaxies surrounding 



the lens (Wong et al. 20111. A galaxy lying near the lens 



on the sky (labeled G22 by |Morgan et al. 2005|) seems to 



be important for reproducing the lensed images (Kochanek 



et al. 



2006 1 . Using the available data, including new near- 

we examine 



infrared photometry (Fadely & Keeton 20111 



the mass distribution of HE0435 and pay particular atten- 
tion to any evidence for substructure. New in our analysis is 
the use of both individual-clump and population-based sim- 
ulations of substructure, which allow us both to constrain 
the masses of clumps near the images and to connect them 
to the broader substructure population. Not included in our 
analysis are effects from small mass halos along the line of 
sight. While such structures may produce millelensing ef- 
fects similar to subhalos within lens galaxies, their ultimate 



importance is still debated (e.g., Chen et al. 2003 Metcalf 
|2005[ ). Where necessary we assume a flat cosmology with 
f2m = 0.27 and Ho = 70.4kms _1 Mpc -1 , which is similar to 
the mean WMAP+BAO+i/o values presented in |Komatsu| 
el^pOlll. 



2 CONSTRAINTS 

Out of the previous observations of HE0435 we must select 
the data we seek to fit. The chosen data should provide valu- 
able constraints on the lens mass distribution, be practical 
to use, and permit a straightforward interpretation. The op- 
timal astrometric data are the HST-derived centroids of the 



lensed images and the main lens galaxy, Gl (Kochanek et al 



2006 1, along with the position of the neighboring galaxy, G22 
(|Morgan et al.|2005[ ). The redshift of G22 is not known (see 
Wong et al. 20111; we assume this galaxy lies at the same 



gan et al. 20051, and pencil-beam redshift surveys have redshift as Gl. The data are summarised in Table [T] 
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Images 





Position (") 


i?-band flux 


L'-band flux 


L'-band flux 


Image A 
Image B 
Image C 
Image D 


-1.165 ± 0.003 
0.311 ± 0.004 
1.302 ± 0.005 - 

-0.226 ±0.003 - 


0.573 ±0.003 
1.126 ±0.004 
-0.030 ±0.005 
-1.041 ± 0.003 


1.751 ± 0.098 
0.998 ±0.037 

= 1.0 
0.851 ± 0.049 


1.837 ± 0.086 
1.271 ± 0.063 

= 1.0 
0.745 ± 0.049 


1.706 ±0.085 
0.991 ± 0.065 

= 1.0 
0.809 ± 0.090 


Galaxies 




Position 


(") 


F555W (mag) 


F814W (mag) 


F160W (mag) 


Gl 
G22 


= 0.0 ± 0.002 
2.585 ±0.005 - 


= 0.0 ±0.002 
-3.637 ±0.005 


21.55 ±0.13 
22.25 ±0.04 


18.85 ±0.13 
21.26 ±0.01 


16.86 ±0.04 
~18.8 



Table 1. HE0435 constraints. The positions and H-band photometry of the images are from [Kochanek et al.| l |2006| . The R-band 
flux ratios reflect the mean and standard deviation from light curve monitoring, and include scatter from intrinsic and microlensing 
variability. The K and L'-band photometry of the images are from|Fadely &; Keeton| j201l} . The data for the lens galaxy Gl, and the 
F160W magnitude of the neighbor galaxy G22, are from Kochanek ct al. (2006), while the remaining data for G22 are from [Morgan] 
|et al.|j2005) . 



More care must be given to the photometric data. Sev- 



eral datasets are available: Kochanek et al. (20061 present 



L'-band monitoring, Mosquera et al. (2011 1 present photom- 



etry in one broa d-band and six narrow -band filters spanning 
~3500-8100 A, IWisotzki et all (20031 present optical inte- 



gral field spectroscopy, and Fadely & Keeton (2011 \ present 
photometry in the near-infrared K and L bands. Figure [2] 
shows the main dependences on time and wavelength using 



the optical monitoring of Kochanek et al. ( 2006 1 and the 



NIR flux ratios of Fade ly fc Keeton| ( |201lj ). Three key fea- 
tures are seen in these data. There is clear time variability 
in the i?-band flux ratios. All three L'-band flux ratios are 
consistent with the mean values of the i?-band flux ratios. 
Two of the i\-band flux ratios are likewise consistent with 
the other wavelengths, but the 7\-band value of the B/C 
flux ratio is a factor of ~1.3 higher than the corresponding 
R- and L'-band values. One other key result, from analy- 



sis of spectra by Wisotzki et al. (20031, is that there is no 



evidence of dust extinction in the lens galaxy. 

The subtlety here is that the measured flux ratios may 
be affected by stellar microlensing, but we would prefer to 
omit microlensing from our analysis to the extent possible 
because it adds considerable computational expense and dis- 
tracts from our focus on dark matter substructure. Thus, we 
need to understand whether it is possible to account for or 
even eliminate microlensing from the flux ratio constraints. 
One simple possibility is to broaden the errorbars so they 
encompass microlensing effects. We do that by computing 
the standard deviation across all epochs in the i?-band light 



curves from Kochanek et al. (2006). This incorporates all 



microlensing and intrinsic variability that occurred during 
the 2-year span of the light curves. 

It does not, however, account for microlensing effects 
with time scales longer than ~2 yr. To take the analysis one 
step further, we consider the multi-wavelength structure of 
the source quasar. For a source redshift of z a = 1.689, the 
R- and 7\-band observations probe rest-frame UV and op- 
tical wavelengths that are dominated by thermal emission 
from the hot q uasar accretion disk, which is small enough 
(~ 10 15-17 cm; Morgan et al. 20101 to be sensitive to mi- 
crolensing. By contrast, the L -band observations (rest frame 
1.4 (jm) should contain emission from both the accretion 



2.5 



2.0 



0.5 



/ h 




2500 2750 3000 3250 4750 5000 

HJD-2450000 (days) 

Figure 2. Flux ratios of images A, B, and D, relative to image 
C, as a function of observation epoch. (Section |5.1| explains why 
we take flux ratios relative to image C.) The B/C flux ratios 
are offset by 0.2 for visual clarity. Small circular points show R- 
band monitoring from |Kochanek et al| | |2006| l , while square and 
triangular points show single-epoch K- and L'-band data from 
|Fadely &: K eeton (201^1. Solid and dotted lines indicate the mean 
and 68% confidence ranges across all epochs in the K-band data. 



disk and the surrounding dust torus ( Rowan- Robinson| 1995 



Nenkova et al.|[2008j ). The relative contributions of the two 



components are not known exactly, but the dust torus prob- 



ably accounts for 20-80% of the flux (e.g., Wittkowski et al. 
|2004| |Honig et alT]|2008[ ). Since the dust torus should be 
large enough to be immune to microlensing, its contribution 
should cause the L' flux ratios to have little or no variability 
from microlensing. We therefore interpret the similarity be- 
tween the L'- and ii-band flux ratios as evidence that there 
is no significant long-term microlensing affecting the J?-band 
light curves. 

In other words, we can take either the L'-band mea- 
surements or the i?-band measurements (with broadened 
errorbars) as microlensing-free estimates of the flux ratios. 
In some sense the choice is not very important because the 
two sets of measurements are consistent with each other. In 
practice, it is easier to work with the J?-band data because 
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at these wavelengths the source is much smaller than the 
Einstein radius of mass clumps larger than ~ 100 Mq, so we 
can effectively treat it as a simple point source in our study 



(Evidence) 



Significance 



of lcnsing by dark matter substructure (cf. Dobler & Keeton 



2006). 



In the context of this analysis we still need to under- 
stand why the A"-band measurement of the B/C flux ratio 
differs from the other measurements. From Figure [2j the 
B/C flux ratio must vary significantly with time and/or 
wavelength. Dark matter subhalos can be ruled out as the 
cause because the agreement between the L' and R flux ra- 
tios suggests that there is little or no "chromatic millilens- 
ing" in HE0435 (see |Dobler fc Keeton|[2006| . Microlensing 
may be a viable explanation, though, if the Einstein radii of 
stars in the lens galaxy are comparable to or larger than the 
size of the _K"-band source. We examine this hypothesis care- 
fully in Section [7.1| considering not only how microlensing 
might explain the if -band data but also whether it could 
have altered the L'-band data as well. To jump ahead, we 
conclude that microlensing can indeed explain the if-band 
data without ruining our interpretation that the L'-band 
data provide good estimates of the microlensing-free flux 
ratios. 

In our primary modeling we elect not to use measured 
time delays as constraints. At the time of our analysis, 



Kochanek et al. (2006| had published time delays based on 
two seasons of monitoring, but it was not clear how well the 
quasar and microlensing variability had been disentangled. 



Indeed, Blackburne & Kochanek ( 2010 1 reported newer time 



delay estimates differed by 2-5cr from the previous results. 
After our analysis was complete, |Courbin et al.| ( |2010[ ) pre- 
sented new data for HE0435 including refined astrometry 
from deconvolution of HST images and time delays from 
four additional years of _R-band monitoring. We compare 
the new time delays to our lens mode ls in Section |7.2| One 
valuable by-product of the analysis by Courbin et al. (20101 
is estimates of the i?-band flux ratios after correcting for 
both microlensing and intrinsic variability in the source. We 
note that these "corrected" f?-band flux ratios match within 
~ la the mean _R-band values used here. 



3 METHODOLOGY 

We use a Bayesian framework both to constrain model pa- 
rameters and to assess the quality of different models. We 
aim to compute the posterior probability distribution 



P(0\d,M) 



P(d\8,M)P(6\M) 
P(d\M) 



(1) 



where d is the data which constrain the parameters for 
model M[]We calculate the likelihood, C = P(d\0, M), from 
the x 2 goodness-of-fit: C oc e~ x / 2 . Since we are only con- 
cerned with relative posterior probabilities, we ignore the 

— 2 /9 

proportionality constant and set C = e x ' .In most cases 
we take the prior distribution, P(6\M), to be uniform for 
the parameters listed in Table [3] one exception is discussed 
in Section RT21 



1 Note that d and 8 can be vectors, but we omit vector notation 
here for simplicity. 



0-0.5 Barely worth mentioning 
0.5-1.0 Substantial 
1.0-1.5 Strong 
1.5—2.0 Very strong 

> 2.0 Decisive 



Table 2. Jeffreys' scale l|Jeffreys||1961|l for grading the signifi- 
cance associated with different ranges of the (logarithmic) Baycs 
factor. 



The denominator in eqn. |T]) is the marginal likelihood 
of the model, also known as the Bayesian Evidence: 



Evidence(M) = P(d\M) = / P(d\9, M) P(6\M) d0 



(2) 



In many astrophysical studies there is only one model be- 
ing examined. In that case the Bayesian evidence can be 
ignored, since the normalisation of the posterior does not 
affect confidence intervals for marginalised parameters. If 
the evidence is not needed, an effective way to proceed is to 
sample from the posterior distribution using methods such 
as Monte Carlo Markov Chains (MCMC). 

The evidence becomes crucial, though, when compar- 
ing different models. Since the Bayesian evidence quantifies 
the overall probability of a particular model, it provides an 
objective way to compare models even if they have different 
numbers of parameters ( MacKay 2003; Gcl man et al.|2003 |. 
The ratio of the posterior probabilities for two models Mi 
and M 2 is 



(3) 



P(M 2 \d) _ P[d\M 2 ) P(M 2 ) 
P(Afijd) ~ P(d\Mi) P(Mi) 

We assume equal prior probabilities for all models, P{M\) = 
P(M 2 ), so the ratio of posterior probabilities is just the ra- 
tio of evidences, which is called the Bayes factor. While the 
principle is well established, the quantitative significance 
of Bayes factors is not completely clear cut, and various 
scales are employed to facilitate the interpretation. The most 
common choice is the Jeffreys' scale ( Jeffreys]|1961 1, which 
grades Bayes factors as shown in Table [2] In this work, we 
use the Jeffreys' scale as a guideline for judging our mod- 
els, and we actually work with the differential log evidence, 
A log 10 (Evidence) = log 10 (Bayes factor). 

The practical challenge lies in integrating over all model 
parameters to compute the Bayesian evidence. We per- 
form the integration using the Nested Sampling algorithm 
(Skilling 2004, 2006), which provides marginalised parame- 
ter ranges in addition to the evidence. As a computational 
tool, nested sampling has been used in a variety of astro- 



physical studies (e.g., Mukherjee et al. 2006 


Humphrey et al. 


2009), including gravitational lensing (e.g., 


Vegetti & Koop- 


mans||2009a Barnabe et al.|2009l. 



Roughly speaking, the idea of nested sampling is to exe- 
cute many random draws from the parameter space accord- 
ing to the following scheme: at each step, the next point 
is drawn uniformly from the prior distribution but limited 
to the region where the likelihood increases. Various pro- 
cedures for doing this constrained sampling have been in- 



troduced (|Mukherjee et al. 


2006 Shaw et al. 2007 


& Hobson 2008| |Feroz et al. 


2009 1 ; it is also possible 



Feroz 
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Parameter MCMC prior Nested Sampling prior 
Minimal, smooth model 



iQgloOW") 


— oo 


oo 


0.02 


0.13 


3>G1 


— oo 


oo 


-0.003" 


0.003" 


VG1 


— oo 


oo 


-0.003" 


0.003" 


e c,Gl 


-1.0 


1.0 


-0.50 


0.50 


e s ,Gl 


-1.0 


1.0 


-0.50 


0.50 


7c 


-1.0 


1.0 


-0.04 


0.06 


7s 


-1.0 


1.0 


—0.03 


0.03 


®G1 


0.00" 


oo 


0.00" 


0.02" 


PG1 


— oo 


oo 


0.95 


1.60 


lo 61(n G22/ ! 


-1.7 


oo 


— 1.70 


—0.12 


■EG22 


— oo 


oo 


2.572" 


2.597" 


VG22 


— oo 


oo 


-3.625" 


— 3.650" 


e c,G22 


-1.0 


1.0 


— 0.70 


0.70 


^s,G22 


-1.0 


1.0 


—0.70 


0.70 




Clump models 






l°glo(W) 


— oo 


oo 


—4.00 


— 1.00 


X A 


— oo 


oo 


-1.40" 


-0.70" 


VA 


— oo 


oo 


0.40" 


0.80" 


logi (6 B /") 


— oo 


oo 


-4.00 


-1.00 




— oo 


oo 


0.20" 


0.80" 


2/B 


— oo 


oo 


1.00" 


1.50" 




— oo 


oo 


-4.00 


-1.00 




— oo 


oo 


-0.40" 


0.50" 


2/D 


— oo 


oo 


1.00" 


1.50" 



Table 3. Model parameters and priors for our smooth and clump 
models discussed in Sections [4] and [5] respectively. We adopt uni- 
form priors within the specified intervals, with the exception of 
6c,G22 and e s G22 for certain models as discussed in Section |6.2| 
Note that e c = e cos 28 e and e s = e sin 28 e are the quasi-Cartesian 
components of the ellipticity e = 1 — q. 



nested sampling without such a strict one-way progression 
( Brewer et al.|20 09). The volumes enclosed by different iso- 
likelihood surfaces may not be known exactly, but they can 
be estimated statistically, so the likelihood values and as- 
sociated volumes can be combined to estimate the Bayesian 
evidence (for details, see Skilling 2004, 2006). Because nested 
sampling is intrinsically stochastic, there is some statistical 
uncertainty in the evidence value, but we can compute the 
uncertainty using the methods presented by |Keeton| ( |201l] ). 

In general we do not wish to place strong priors on our 
model parameters, so we have a large parameter volume to 
explore. To alleviate the computational burden, we adopt a 
two-step approach to sampling. First, we execute an MCMC 
sampling of the posterior using uniform priors defined in 
Table [3] Details of our MCMC algorithm, techniques, and 
convergence criteria are discussed in Section 3.4 of |Fadely| 
eTaLlp0T0| . We use the posterior from MCMC to construct 
narrower priors that encompass the 99.999%' CL parameter 
ranges (Table [3] dotted lines Figure [I]). Using the narrower 
priors for nested sampling reduces the amount of time spent 
in regions of extremely low likelihood (x 2 > 10 6 ). Tests with 
multivariate Gaussian distributions indicate that truncating 
such low-likelihood regions of the parameter space does not 
significantly alter estimates of the Bayesian evidence. 



4 SMOOTH MODELS 

As a first step we examine how well a smooth mass distri- 
bution (without substructure) can fit the observed image 
positions and flux ratios for HE0435. Following Ko chanek| 
et al. ( 2006[ ), we adopt a "minimal" lens model in which the 
main lens galaxy (Gl) has an ellipsoidal mass distribution 
with a softened power law density profile, 



«(0 = 



,2-/3 
°G1 

2 (s 2 +C 2U " 



0/2 



(4) 



where s is the core radius, £ = \J x 2 + y 2 /q 2 is the ellipse 
coordinate (in the major axis frame), and q ^ 1 is the pro- 
jected axis ratio. The power law index is defined such that 
the mass enclosed within radius R scales as M(R) oc B? , so 
an isothermal profile has /3 = 1, while a steeper (shallower) 
profile has j3 < 1 (/3 > 1). Note that for a pure power law 
profile, the corresponding 3D density profile is p oc r' 3-3 . 
The normalisation parameter b has dimensions of length, 
and for a pure power law b is directly proportional to the 
Einstein radius: -REin = bF{j3,q). The proportionality factor 
F has a complicated form in terms of special functions, but 
if we make a Taylor expansion in the ellipticity e = 1 — q we 
can write 



F = fi~t 



1-5 
2 



16 



(5) 



We vary all model parameters for the main lens, including 
P. IKochanek et al.l (|2006h found that HE0435 has a density 



profile that is shallower than isothermal, which leads to a 
rising rotation curve. Varying /3 is important if we are to 
account for a wide range of possible mass distributions. 

Our minimal model also includes the effects of the en- 
vironment of HE0435. The neighboring galaxy G22 is close 
enough (only 4.4" from Gl) that it needs to be included 
explicitly. As in previous studies, we assume that G22 lies 
at the same redshift as Gl. 



Kochanek et al. 



(2006) modeled 
G22 as a singular isothermal sphere and found a best-fit 
Einstein radius of 0.22", so the galaxy should provide neg- 
ligible surface mass density at the location of HE0435, and 
an isothermal profile should be adequate. We do, however, 
generalise by letting G22 be elliptical and use a singular 
isothermal ellipsoid model. We also add an independent ex- 
ternal shear to account for tidal effects from the group of 



galaxies surrounding the lens (Morgan et al. 
et al.|2011 l. 



2005 Wong 



We use an updated version of the public lensmodel code 
( Keeton|2001[ ) both to find the best-fit smooth model and to 
run nested sampling. The resulting Bayesian evidence is re- 
ported in Table|4]below. The best-fit model has \ 2 = 24.6 for 
Ndot = — 1- Since the model is formally underconstrained, 
finding \ 2 7^ indicates that the model is not sufficiently 
flexible — it lacks some key freedom. To diagnose the failure, 
we show in Figure[5]the distributions of flux ratios predicted 
by the smooth model, compared with the observed values]^] 
The smooth model is unable to account for the observed 
AjC flux ratio at high confidence. This constitutes a clear 
"flux ratio anomaly" of the sort that has been seen in other 

2 Strictly speaking, if the fluxes are normally distributed then the 
flux ratios follow Lorentzian distributions. For practical purposes, 
though, we treat the flux ratios using Gaussian distributions. 
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Figure 3. Solid lines show marginalised probability distributions for the ij-band flux ratios inferred from our smooth, minimal mass 
model. Dotted lines show the likelihood functions for the observed ij-band distributions (Gaussians with the measured mean and variance). 
All curves are normalised to have a peak of unity. The minimal model can match the observed B/C and D/C flux ratios, but cannot 
match A/C. 



lenses (e.g.,|Mao fe Schneider|1998||Bradac et al.|2002||Met- 



calf fc Zhao|2002| |Keeton et al.|2003||2005| > and interpreted 
as evidence for dark matter substructure (e.g., Metcalf & 
Madau|200l] |Dalal fc Kochanek|2002| ). 



5 FEW-CLUMP MODELS 

Motivated by the anomaly, we hypothesise that HE0435 con- 
tains substructure and examine how well we can test that 
hypothesis and constrain the properties of the substructure. 
In this section we consider whether it is possible to explain 
the data with just one or a few clumps that (presumably) 
lie near the lensed images. In Section [6] we study full popu- 
lations of substructure. 



5.1 Approach 

We first need to reflect on where we might expect mass 
clump(s) to be. Since the observed A/C flux ratio is higher 
than predicted by smooth models, we need substructure 
to increase the predicted ratio, which means making im- 
age A brighter or C fainter. In HE0435, A and C are both 
positive-parity images. Substructure tends to make positive- 
parity images brighter and negative-parity images fainter 
( Schechter & Wambsganss 2002; Keeton 2003 1, so we expect 



to need a clump near image A. We do not place a clump near 
image C, because that would tend to make C brighter and 
exacerbate the flux ratio anomaly. (This is the reason we 
have chosen to compute flux ratios relative to image C.) 

We imagine that there might be additional clumps near 
images B and/or D. To be specific, we consider a model 
with one clump near image A, a model with two clumps 
near A and B, a different model with two clumps near A 
and D, and a model with three clumps near A, B, and D. 
For simplicity, we label these four models A, AB, AD, and 
ABD, respectively. The models clearly have different num- 
bers of parameters, but the Bayesian analysis can provide 
an objective ranking of the models (through the evidence) 
along with constraints on the clump positions and masses 
(through parameter marginalisation) . 

We model the clumps with a spherical pseudo-Jaffe pro- 
file, which has a three-dimensional density p(r) oc l/r 2 (a 2 + 



sity of the form 



«(r) = 



^clump 



l 



\/a 2 + r 2 



(6) 



where fe c ium P sets the mass scale while o represents a trun- 
cation radius. The total mass for this model is M tota i = 
7rE cr it6 c i ump a. Pseudo-Jaffe clumps are efficient lenses be- 
cause they have a steep, isothermal slope inside the trun- 
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Figure 4. Similar to the top left panel of Figure [3] but for 
our model with a mass clump near image A (solid, blue) and 
our model with three clumps near images A, B, and D (dashed, 
red). Adding a clump near image A clearly brings the models into 
agreement with the data. Adding clumps near images B and D 
has little effect on the predicted A/C flux ratio, indicating that 
constraints on clump A are fairly independent of the presence of 
clumps near other images. 




-1.2 -1.0 
X (arcsec) 



0.6 



Figure 5. 95% confidence constraints on the position of the 
clump near image A (marginalised over all model parameters). 
The circle indicates the position of image A. Dotted, dashed, 
and solid contours show results for clumps with masses less than 
10 6 , 10 7 , and 10 s Mq, respectively. More massive clumps can lie 
farther from image A and still reproduce the observed flux ratio. 



cation radius; clumps with shallower profiles (e.g., NFW) 
are less efficient lenses and could therefore lead to different 
model results. We select the pseudo-Jaffe model because it 
includes the effects of tidal truncation, and its role in earlier 
studies (e.g., |Dalal fc Kochanek||2002| |Vegetti et al.||2010a l 
facilitates the comparison of previous results with our re- 
sults using new methodology. We defer a systematic study 
of clump density pr ofiles to follow-up work. Following |Dalal| 



& Kochanek 



( |2002| , we set a = ^(b G i)b 

clump, j 



to account 



for tidal truncation of a pseudo-Jaffe profile by the parent 
halo in an approximate but reasonable way. Here (&gi) is 



the average mass normalisation of Gl and b c i un 



is the 



maximum mass scale parameter for the clump. For HE0435, 
this works out to be a — 0.367". 

Table [3] provides a complete list of the parameters for 
our few-clump models. We vary all of the parameters using 
nested sampling to obtain both evidence values and parame- 
ter constraints. We then use the constrained clump parame- 
ters to compute the clump masses. We can compute the total 
mass of the pseudo-Jaffe model, but we expect the quantity 
that is more relevant for lensing (especially flux ratios) to be 
the mass within the Einstein radius. We quote both but give 
particular attention to the mass within the Einstein radius. 



5.2 Results 

We first add a single clump near image A to our macromodel. 
Table [4] gives the Bayesian evidence along with constraints 
on the clump parameters. Figure [4] shows that adding the 
clump does much to alleviate the discrepancy between the 
predicted and observed flux ratios: the predicted value is 
now A/C = 1.72^0'Jo. In fact, the model is able to repro- 
duce the data perfectly, with a best-fit value of % 2 = 0. In 
some sense this is not surprising because the model is under- 
constrained with A^dof = —4, but we saw before that being 
underconstrained does not guarantee a perfect fit. Appar- 



ently the clump provides some important freedom that was 
not present in the smooth, minimal model. 

Figure [5] shows constraints on the position of the clump 
near image A for different mass ranges. We find that the 
clump position and mass are degenerate in the sense that a 
more massive clump can lie farther from the image and still 
reproduce the observed flux ratio. This is familiar from pre- 
vious studies (e.g., |Dalal fc Kochanekp)02"] |Keeton||2009| l, 
and not surprising because, heuristically, flux perturbations 
are driven by shear perturbations of the form ^7 oc M/d 2 , 
where d is the distance of the clump from the image (see 
Mao & Schneider 1998]) . In principle, a star placed close to 
the image can produce the same magnification as a more 
massive clump placed farther away (provided the source is 
sufficiently small). Adding position constraints can break the 
degeneracy, though. Position perturbations are driven by de- 
flection perturbations of the form 8a oc M/d ( |Chen et al. 
20071 |Keeton||2009l). Thus, if both the position and flux are 



affected by a clump, the different scalings make it possible 
to constrain the clump mass. On the one hand, a very low- 
mass clump is simply unable to affect the image position, 
regardless of its location; on the other hand, a high-mass 
clump may disturb the image position too much, and may 
affect other images as well. In HE0435 these effects let us find 
bounds on the mass of clump A: log 10 (ME in ) = 7.65^q gl ( or 
log 10 (Af t otai) = 9.31±o:42)0 We conclude that the clump is 
well constrained by the combination of both flux and posi- 
tion data. 

We next consider models with more than one clump 
near the images. We keep the clump near image A because 
it seems essential, but try placing additional clumps near B 
and/or D. Table [I] gives the evidence values and parame- 
ter constraints for the various models. Figure [4] shows that 

3 As noted in Section [5.l| we emphasise the mass within the Ein- 
stein radius because we expect it to be the more robust quantity. 
The total mass is more sensitive to the clump profile and trunca- 
tion radius. 
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adding more clumps does not significantly alter the pre- 
dicted A/C flux ratio. This, in turn, implies that additional 
clumps have little effect on the inferred properties of clump 
A (see Table gj. 

We can assess the relative probabilities of the various 
models using the Bayesian evidence values in Table [4] The 
models with clumps all have evidence values that are at 
least three orders of magnitude greater than our minimal 
smooth model. Clearly, the data strongly prefer models with 
at least one clump near the lensed images. According to the 
Jeffreys' scale (Table the case for at least one clump is 
decisive for the range of models considered here. Examin- 
ing the evidences in detail, we see that model AB has the 
highest evidence, with a value that is 0.63 dex higher than 
for the single-clump model. Formally, this provides "sub- 
stantial" evidence for a second clump near image B with 
masslog 10 (A/# in ) = 6.55±i;j£ (or log 10 (M t * J = 8.76t°;?°). 
Since the evidence values carry uncertainties of 0.16 dex, the 
case for clump B is intriguing but far from decisive. 

It is striking to see that adding a clump near image 
D has essentially no effect on the Bayesian evidence: the 
evidences for the A and AD models are indistinguishable 
(given the uncertainties), and likewise for the AB and ABD 
models. Apparently the parameters associated with clump D 
do not significantly improve the models' ability to reproduce 
the data, so they produce little change to the evidence. This 
is an example of Occam's Razor in action. 

It is interesting to study how the addition of clumps af- 
fects the parameters of the smooth mass distribution. Figure 
[6] shows joint posterior probability distributions for several 
key parameters, before and after adding a clump near image 
A. In general, adding the clump broadens the distributions, 
which is not surprising because of the increased flexibility 
afforded by the clump. For some parameters the posterior 
also develops structure indicative of covariances in the 14- 
dimensional parameter space (see the right-hand panel of 
Fig. Rjl. Even so, the median values of the distributions are 
not significantly altered, typically shifting within the 68% 
confidence intervals of the no-clump model. 

One notable parameter is the density slope of the main 
lens, j3. Using the quasar image positions and (estimated) 
time delays, the Einstein ring image of the quasar host 
galaxy, and a prior on Ho, Kochanek et al.] (|2006|) found 



the slope to be shallower than isothermal, corresponding 
to P > 1.0 in our models. We obtain /3 = 1.19±o.ii and 
1 -19to' 15 for our models without and with clump A, respec- 
tively. Thus, we find evidence for a shallow density profile 
independent of time delay constraints, and that result is not 
affected by the presence of substructure. 

The mass constraints we find on substructure in HE0435 
are a first for quasar lenses. Previous work in the radio and 
mid-infrared has provided good evidence for substructure 
but has not necessarily yielded upper and lower bounds on 
clump masses (e.g., Chiba et al.|200 5 Minczaki e t al.|2009 |. 
Presumably the constraints from flux ratios and image posi- 
tions in some lenses are not (yet) strong enough to determine 
clump masses. More recently, studies of the lens systems 



B2045+265 (McKean et al. 20071, MG 2016+112 (More 



et al. 20091, and H1413+117 (MacLeod et al. 20091 have 



been able to place specific constraints on clump masses. In 
all of those cases, however, the substructure was linked to a 
luminous satellite whose position could be constrained from 



direct observations. Fixing the position breaks the position- 
mass degeneracy that is inherent in flux perturbations, and 
thus can lead to very good constraints on mass (<7m ~ 0.1- 
0.3 dex). Our results for HE0435 are novel in two ways. 
First, we have been able to constrain clump positions and 
masses from lens data alone, with no direct observations 
of the clump (s). This shows that it is possible to constrain 
clumps even if they are invisible. Second, the masses we have 
found for clumps A and B are among the smallest found in 
any lens system to date|^] 

While we believe these conclusions to be new and in- 
teresting, we do offer one cautionary note. Our clump con- 
straints have been derived in the context of a fairly simple 
macromodel. Adding more flexibility to the macromodel, 
such as higher-order multipole modes or pixellated poten- 
tial perturbations (e.g.,|Evans fc Witt|2003||Yoo et al.|2005 



2006 Blandford et al. 2001 Koopmans 20051, would pre 



sumably alter the inferred clump constraints (e.g., broaden 
the mass uncertainties). However, Congdon & Keeton ( 2005 1 
found that such features alone cannot explain flux ratio 
anomalies, and |Yoo et ak] ( |2005| |2006| ) found that ellipti- 
cal symmetry seems to be a reasonable assumption for lens 
galaxies. We therefore expect that adding reasonable flexi- 
bility might weaken but not eliminate the evidence for sub- 
structure in HE0435. 



6 SUBSTRUCTURE POPULATION MODELS 

So far we have concentrated on a few individual clumps near 
the lensed images of HE0435. Those clumps are presumably 
not the only examples of substructure in the system, but 
rather special representatives of a larger population (special 
in that they lie near an image). In this section we aim to 
constrain a full substructure population of the sort predicted 
by CDM (e.g., |Diemand et al.|2007| |Springel et al.|2008 |. 



6.1 Model 

We assume the clump population is characterised by a power 
law mass function of the form dN/dM oc M a with a = —1.9 
( Diemand et al.||2007| |Springel et al.||2008| ) and fixed lower 
and upper total mass thresholds of M to tai = 10 7 Mq and 
1O 1O M0, respectively. We assume the clump positions are 
drawn from a uniform spatial distribution out to 10" from 
the centre of the lens. While realistic substructure may not 
be spatially uniform (e.g., Zentner et al. 2005 Springel et al 



|2008||Nierenberg et alJ]2011| ), using a uniform distribution 
facilitates comparison with previous work (e.g., Dalai & 



Kochanek 2002). Moreover, the choice of spatial distribution 



should not be terribly important for our results, because we 
focus on observables (image positions and flux ratios) that 
are mainly sensitive to clumps in the vicinity of the Einstein 
radius (e.g., |Rozo et al.|2006| |Keeton|2009| . 



We characterise the abundance of substructure using 
the mean convergence, k s = E s /E cr it, where S s is the mean 

4 As noted in Section |5.l| the model results — including the in- 
ferred clump masses — may depend on the choice of clump density 
profile. However, our use of the same clump profile as in previous 
work means that it is fair to compare clump masses from different 
studies. 
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Table 4. Marginalised clump parameters and evidence values for models which add the specified clump(s) to our minimal, smooth 
model. We quote differential log evidence values relative to the smooth model to facilitate model comparison (see Section [3] and Table 
2b. Positions are given in arcsec. The clump mass is the mass within the Einstein radius, in units of h^ Mq. 
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Figure 6. Joint posterior probability distributions for two pairs of parameters: the quasi-Cartesian components of the ellipticity of Gl 
(e c = ecos2(9 e and e s = esm20 e , left panel), and the mass normalisations of the lens Gl and its neighbor G22 (right panel). The solid 
curves show the 68% and 95% confidence contours from our minimal, smooth model, while the dashed curves shows results from our 
model with a clump near image A. Adding substructure can broaden the parameter distributions (e.g., left panel) and also introduce 
covariances (e.g., right panel). 



surface mass density in substructure (averaged over many 
realisations), and S cr i t is the critical surface mass density 
for lensing. For lensing purposes it is convenient to work 
with the scaled clump mass, m = Mtotal/Scrft, which has 
units of angular area. If the number density of clumps per 
unit mass of dn/dm, then the substructure convergence is 

k s = [ m dm. (7) 
J dm 

Before undertaking extensive simulations, we would like to 
see if we can use the clumps inferred so far to estimate the 
properties of the larger population. While such an estimate 
must be taken with a grain of salt, it may help guide our 
exploration of parameter space. In Appendix A we present 
a toy model for the probability distribution P(n 3 ) based on 
the idea that a clump population should have one clump in a 
location that leads to a strong flux perturbation in image A. 
That analysis leads to an estimate of k s = 0.0251q'q22 (95% 
CL). Therefore, we consider the values of n s = [0.00022, 
0.00046, 0.001, 0.0022, 0.0046, 0.01, 0.022, 0.046, 0.10] in 
our simulations. 

For technical reasons, the clumps used in our popula- 



tion models differ from those used in our few-clump models 
in two small ways. First, instead of a smoothly truncated 
pseudo-Jaffe profile we use a sharply truncated isothermal 
profile whose density is p oc r~ 2 inside the truncation and 
zero outside. Second, the truncation radius is not fixed but 
scales with clump mass such that the ratio of the trunca- 
tion radius to the Einstein radius is fixed to a ratio of ~60. 
(When we use a wide range of clump masses, it seems more 
sensible to have the truncation radius scale with mass than 
to use a some fixed value.) While in detail these differences 
may influence the distributions of deflections and magnifi- 
cations produced by clumps, they should not significantly 
affect our results. 



6.2 Approach 

The complete set of parameters for this analysis includes the 
smooth model parameters (denoted by 0), the substructure 
convergence (k s ), and the positions and masses of individual 
clumps (denoted by c). The quantity we seek is the posterior 
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distribution for k s after marginalising over 9 and c, 
P(n a \d) = Z^l [ C{d\c, 0) P{c\k s ) P{6) dc dO (8) 



The likelihood C depends explicitly on the clump positions 
and masses and the smooth model parameters; k s enters 
only implicitly, through the number of clumps, so we have 
written it in the priors P(c\k s ). The latter factor also in- 
cludes the priors on the clump positions and masses de- 
scribed above. Finally, P(0) indicates priors on the smooth 
model parameters (see Table |3j, and Z to t is a normalisation 
factor. 

Formally the integrand in eqn. Q may have hundreds 
or thousands of dimensions, depending on the number of 
clumps, so we cannot evaluate it directly. Instead, we use 
Monte Carlo techniques. Let Cj denote one particular real- 
isation of the clump population. Suppose we generate N c 
realisations for a particular value of k s - Then heuristically 
we can let 



f(c)P(c\K s ) dc 



N, 



1 N ° 



(9) 



3=1 



We can then write the marginalised posterior for 



P(K s \d) 



Z tot N c 



j=i 



52 / c(d\ Cj ,o)P(o) de 



(10) 



Here k s is implicit on the right-hand side because it deter- 
mines the number of clumps. Let us decompose this expres- 
sion a little bit further. We can think of the integral as 
the Bayesian evidence for the macromodel, given a particu- 
lar clump realisation, so let us put 



Zj{K 8 



C(d\c h 0)P(0) dO 



We can then rewrite eqn. ( 10 1 as 



P(«.|d) = 



1 



N c 



Z tot N r _ 



52 Z 2 ( K = 



(11) 



(12) 



3=1 



In other words, the marginalised posterior for k s is the aver- 
age macromodel evidence over many clump realisations (up 
to an overall normalisation factor). 

In principle, we just need to use nested sampling to 
evaluate the integral for each of many realisations, and 
then take the average. There are two practical issues. First, 
there is some statistical uncertainty in the average due to 
having a finite number of realisations. We set N c — 5000 
in order to sample the distribution well enough to achieve a 
statistical uncertainty of ~ 20% in the evidence marginalised 
over clump realisations (verified with jackknife estimates). 

Second, in our current implementation it can take sev- 
eral hours or more to run the nested sampling for a given 
clump realisation, so it is impractical to do the full evidence 
calculation for all 9 x 5000 cases. Instead, we explore the pos- 
sibility of using the minimum % 2 value (which is much easier 
to determine) as a proxy for the evidence in each case. The 
minimum x 2 provides a measure of the peak log likelihood, 
so it may be more or less indicative of the evidence depend- 
ing on whether the width of the likelihood distribution is 
fairly regular or irregular. For a subset of clump realisations 
we find both the best \ 2 (by optimising across 0) and the 



full evidence (by integrating over G), and we compare the 
two quantities in Figure [7| 

Looking first at the case with n s = 0.001, we see distinct 
patterns in the points. At values \ 2 ?S 6.5, we find that \ 2 is 
a very poor predictor of evidence: realisations with similar 
X 2 values can have evidence values that span many orders 
of magnitude. This presumably occurs because certain real- 
isations can produce good fits only for a highly tuned set of 
macro parameters, meaning the likelihood distributions are 
narrow in and the evidences are small; whereas other real- 
isations can have a much larger range of macro parameters 
that produce reasonable fits. (See Fig.[8]for more discussion.) 
Above x 2 ~ 6.5, there is a tighter relation between x 2 an d 
evidence. The evidence decreases with x 2 U P to x 2 ~ 21, at 
which point the evidence values become consistent with the 
evidence for the smooth, minimal model. The patterns gen- 
erally persist as k s increases, although with more scatter. 
We attribute the scatter to having more clumps and thus 
a wider range of substructure perturbations, some of which 
make the model more consistent with the data but many of 
which go in the opposite direction. 

In Figure[7]we also plot the mean and scatter in the evi- 
dence values for bins of x 2 ■ We use these to create a "lookup" 
scheme to convert from x 2 1° evidence for subsequent reali- 
sations. Specifically, for each realisation we optimise across 
the macro parameters to find the best-fit x 2 ■ We interpo- 
late between bins to find the mean and scatter in the log 
evidence for that x 2 value. We then draw from the appro- 
priate log-normal distribution to assign an evidence value 
to this realisation. With this process it becomes feasible to 
complete 5000 realisations for each value of k s . 

During the course of this analysis, we initially found 
that some clump realisations led to unreasonably large best- 
fit values for the ellipticity of the neighbor galaxy G22 
(eG22 ~ 0.9). In order to prevent this, we adopted a mild 
Gaussian prior of 0.0 ± 0.2 on the two quasi-Cartesian ellip- 
ticity components. While ad hoc, this prior is tight enough 
to prevent unrealistic values for the ellipticity and to sta- 
bilise the x 2 values, yet broad enough to allow a large range 
of ellipticity. 



6.3 Results 

Figure [9] shows the cumulative distribution of x 2 values for 
different values of k s . As k s increase the distribution of x 2 
values gets broader; in other words, with more substructure 
there is a higher chance that images will be perturbed and 
the model will move away from the smooth case. The scatter 
goes in both directions: some clump realisations make the 
fit better, while others make it worse. That is in the nature 
of stochastic substructure. 

Figure [lO] shows the average evidence as a function of 
the substructure convergence. We find that models with 
k s ^ 0.001 have evidence values that are some three or- 
ders of magnitude higher than models with little or no sub- 
structure. According to the Jeffreys' scale, this is additional 
strong evidence for substructure in HE0435. Moreover, we 
find that the A log 10 (evidence) values for population models 
are similar to those for few-clump models (TableEJ, indicat- 
ing that the data are consistent with, but do not objectively 
favour, millilensing by a full population of clumps. 

We can translate the substructure convergence, « s , into 
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Figure 7. Best-fit x 2 values and evidences for a subset of our simulations (gray points). We actually plot the differential log evidence 
relative to the smooth, minimal model. The three panels show results for k s = 0.001, 0.01, and 0.10. In all cases, the best-fit x 2 is a poor 
predictor of the evidence for x 2 $ 6.5. For small values of n a , the evidence is tightly correlated with x 2 for x 2 ^ 6.5, and it is consistent 
with the smooth model value for x 2 ^ 21. As K a increases, so does the scatter in evidence values. Overplotted are the mean and scatter 
in A log 10 (Evidence) for x 2 bins, which are used to convert from x 2 to evidence for additional realisations (see text). 




Figure 9. For a given k s , we generate many substructure re- 
alisations, find the best x 2 f° r each one, and plot the cumula- 
tive distribution of the resulting x 2 values. Different colours in- 
dicate different k b values ranging from 0.00022 (black) to 0.10 
(light orange). The vertical dashed line marks the optimised x 2 
for our smooth, minimal model. As n a increases, the x 2 distri- 
bution broadens because some substructure realisations improve 
the fit while others worsen it. 



a substructure mass fraction at the Einstein radius. Our 
power law macromodels have k,q ~ j3/2 at the Einstein ra- 
dius (see, e.g., Kochanck 2002), so the local substructure 
mass fraction at the Einstein radius is / su b = fts/^o ~ 
2k s //3. As noted in Section |5.2| we find f3 ~ 1.19 with 
~14% uncertainty. Thus, given that we can decisively rule 
out values k 3 ^ 0.00046, we conclude that / su b > 0.00077 
in HE0435 at high confidence. At present we do not obtain 
an upper limit on / su b; the Bayesian evidence remains high 
for values as large as / su b ~ 0.20. We speculate that such 
high substructure mass fractions are permissible thanks to 
the flexibility and freedom available to our macro model. 

Our lower bound / su b > 0.00077 is consistent with, 
but weaker than, other lensing-based measurements. Using 
a sample of seven radio quads 



Dalai & Kochanek (20021 



found 0.006 < / su b < 0.07 at 90% confidence. In the case 
of SDSS 0946+1006, 



0.0215 



+0.0205 



Vegetti et al. 



(2010b|) found / sub = 
-1.9 ± 0.1 



confidence), assuming a : 
for the slope of the substructure mass function and total 
mass thresholds M tota i = 1O 6 6 M and 10 9 <3 M Q . Both re- 
sults point to values of / su b that are higher than the values 
found in N-body simulations (/ su b ~ 0.002-0.003; e.g., Die- 



mand et al. 2007 Springel et al. 2008 Xu et al. 20101. It 



is striking that HE0435 provides such strong evidence for 
substructure yet permits / su b values that are low and fully 
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Figure 8. Spatial distributions of clumps for two realisations with K s = 0.001 (left) and k s = 0.01 (right). Triangles mark the image 
positions, while circles indicate clumps with the circle size proportional to the clump Einstein radius. Each realisation shown here provides 
a reasonable fit to the data and has at least one clump near image A (with spatial locations similar to our few-clump models). Note, 
however, that the realisations in the top row have worse \ 2 values but much higher evidence values than the realisations in the bottom 
row. We conjecture that when there is a massive clump near the images (as in the bottom row) the smooth component is confined to a 
smaller region of parameter space, leading to a narrower posterior distribution and hence a lower evidence value. 



consistent with CDM predictions. This result might imply 
that HE0435 simply has less substructure than some other 
lenses. Alternatively, it might indicate that something about 
HE0435 makes it less effective than some other lenses at 
constraining substructure. It is important to remember that 
flux ratios are mainly sensitive to clumps near the images, so 
they directly probe a small fraction of the lens galaxy halo, 
whereas / su b describes the global substructure population. 
Regardless of whether lens galaxies have different amounts 
of substructure or simply different strengths of anomalies 
(due to the stochastic nature of substructure lensing) , it is 
clear that future studies will need to examine ensembles of 
lenses (like the seven used by |Dalal fc Koch anck 2002] but 
ideally even more) to produce strong, robust constraints on 
the global properties of dark matter substructure in galaxies. 



Since our models permit, and other lensing results im- 
ply, / su b values higher than CDM predictions, it is worth 
considering how such a discrepancy might arise and whether 
it presents a challenge to CDM. On the theory side, one 
possible concern is that the number of surviving subhalos 
in N-body simulations might be underestimated, due to the 
lack of baryons in most simulations. In general, baryons are 
expected to cause dark matter (sub)halos to contract and 
become more concentrated, and that may make subhalos 
less susceptible to tidal disruption (e.g., |Dolag et aL]|2009[ ). 
On the other hand, however, baryons also make the parent 
halo more concentrated, and that might raise the rate of 
tidal disruption (e.g., |Romano-Dfaz et al.|2010[ ). Other pos- 
sible concerns lie on the lensing side. Currently, the number 
of lenses available for studying / su b is small (of order 10) and 
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Figure 10. Differential log evidence (relative to the smooth, min- 
imal model) as a function of the substructure convergence. Models 
with k s 0.00046 exhibit evidence values similar to those with- 
out substructure. Models with k s ^ 0.001 are strongly favoured. 
According to the Jeffreys' scale (Table [2J, these results provide 
decisive evidence for substructure in HE0435. 



thus sensitive to both statistical uncertainties and selection 
effects. Lensing is generally biased toward more massive and 
concentrated galaxies, perhaps with preferential orientations 
along the line of sight (e.g., Rozo et al.||2007 Mandelbaum 



et al.||2009[ ). Furthermore, lens galaxies tend to lie in over- 



dense environments (e.g., Momcheva et al. 2006 1, and the en- 



vironment may boost the abundance of substructure ( Oguri 
|2005[ ). More work needs to be done to understand whether 
selection effects in lensing propagate into a significant bias 
in /sub- 

Beyond our quantitative constraints on / su b, there are 
two aspects of our analysis worth highlighting. First, we have 
examined both individual clump models and full substruc- 
ture population models for a given system, and sought to 
connect them. This is a first for quasar lenses. For galaxy- 
galaxy strong lenses, |Vegetti et al.| |2010b) have detected 
a clump via gravitational imaging, and used that to infer 
/sub by an analysis similar in concept to what we present in 
Appendix A. 

Another key feature of our analysis is the method used 
for studying substructure populations. Here, the only com- 



parable study is that oflDalal & Kochanek ( 2002 1. Due to the 



complexity and computational demand of the study, [Dalai fc| 
[Kochanek chose to linearly reoptimise the macromodel and 
then work with the best x 2 for each substructure realisation. 
By contrast, we have fully marginalised the macromodel and 
worked with the Bayesian evidence. We have found that the 
best x 2 value can be an unreliable tracer of the evidence, 
at least in the HE0435 system. If such behavior occurs in 
the systems and models studied by |Dalal fc Kochanek| it 
could affect the recovered / su b values. Additionally, |Dalal| 
Ifc Kochanekl assumed a uniform mass for their substructure 
population. Here, we have considered a more realistic popu- 
lation with a mass function like that seen in N-body simula- 
tions. Due to the computational demand of our simulations, 
we have so far only examined one value of a and one range 
of clump masses. Future work will explore the dependence 
of inferred / su b values on these parameters. 



7 IMPLICATIONS 

While we have focused on using image positions and flux 
ratios to probe substructure in HE0435, our models have 
additional implications that we explore in this section. 



7.1 A-band flux ratios 

Our substructure models are able to account for the ob- 
served ii-band flux ratios and, due to their similarity, the L' 
flux ratios as well. As noted in Section [5J however, the B/C 
flux ratio is a factor of 1.27 higher in A than in the other 
passbands. This anomaly is perplexing because the A'-band 
emission (rest-frame 0.82 fj,m) presumably originates from 
the quasar accretion disk, so the A and R sources should 
both be small compared with the Einstein radii of dark mat- 
ter clumps and should therefore experience similar lensing 
magnifications. 

Since differential dust extinction is not likely in HE0435 
( |Wisotzki et al.||2003[ ), we hypothesise that the A-band 
anomaly is caused by stellar microlensing that happened to 
be stronger at the time of the infrared observations than dur- 
ing the earlier optical monitoring by |Kochanek et aT| (f2006). 
Under this hypothesis, the (nearly) contemporaneous L flux 
ratios were not anomalous because the L' emission originates 
from a larger region that is less susceptible to microlensing. 
We test whether this hypothesis is reasonable by simulating 
microlensing near image B using the ray-shooting code of 
|Wambsganss| ( |1999[ |. 

We simulate microlensing in a box with side length 
L = 15i?Ein, where 7?Em is the Einstein radius for the mean 
star mass. This box is chosen to be large compared with the 
source, but small enough that we can apply a uniform con- 
vergence and shear across the box. We set the convergence 
and shear to the values predicted by our best-fit two-clump 
(AB) lens model: Ki oca i = 0.694 and 7i oca i = 0.486. To divide 
ftiocai into t ne contribution from stars («;*) and a contribu- 
tion that is smooth on the scale of the box (i.e., from dark 
matter), we use results from star+dark matter lens models 



for HE0435 by |Kochanek et al.| ( |2006[ ). Those models yield 
ft* = 0.05 log 10 (r c /i/kpc), where r c is the NFW scale radius 
in the models. For the range of values, 2.5" < r c < 20", con- 



sidered by Kochanek et al. (20061, the ft* values lie between 



0.05 and 0.10. We try both extremes. 

For each value of ft* we generate 100 random realisa- 
tions of the stellar distribution, drawn from a mass function 
dN/dm oc m -1,3 over the range 0.01-1.5 Mq. Such a mass 
function agrees with measurements from the Galactic bulge 



I Gould 2000 \ and has been used in previous microlensing 



studies ( |Morgan et al.|2010[ |Poindexter fc Kochanek|2010 1 
With this mass function and the source/lens redshifts appro- 
priate for HE0435, the Einstein radius for the mean stellar 
mass is log 10 (AEm/") = —6.1. 

For each realisation we use ray shooting to construct 
a magnification map with resolution L/1024. We then con- 
volve the map with a uniform circular source whose size cor- 
responds to the expected size of the A-band emission region. 
To estimate the source size, we start with the empirical re- 



sults for /-band sources from Morgan et al. (2010 1, and then 



scale from I (rest-frame 0.26 Atm) to A using the fa miliar 
relation R oc A 4 / 3 from 

+0.5 
0.7) 



yields a A-band source size of log 10 (A ; 



Shakura & Sunyaev (19731. This 
77 



-6.1 
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Figure 11. Probability distributions for the B/C flux ratio un- 
der the influence of microlensing. The three curves correspond to 
different source sizes: median size (dashed/blue), 68% CL high 
value (solid/black), and 68% CL low (dot-dashed/red), using the 
size distribution from Morgan ct al. (2010). The vertical dotted 
line indicates the value needed to reproduce our i\-band obser- 
vations. We find that small source sizes < 0.48 or 0.73 (in units 
of the Einstein radius of the mean star mass) can reproduce the 
observations well for = 0.05 or 0.10, respectively. 



or Rsrc/Rmn = 0.2-3.1 (68% CL). Already we see that the 
if-band source has a size that should make it sensitive to 
microlensing. 

Figure [Tl] shows results from the microlensing simula- 
tions. In general, it is not hard for microlensing to increase 
the magnification of image B by a factor of 1.27. To go into 
more detail, we consider what range of source sizes can yield 
a microlensing boost > 1.27 more than 16% of the time (cor- 
responding to a two-sided 68% CL). This requires a source 
size of R SIC /R E in < 0.48 (0.73) for = 0.05 (0.10). Such 
sizes are well within the estimated range for the 7\-band 
source (-Rsrc/^Ein ~ 0.2-3.1), so it seems quite plausible 
that microlensing can create the additional magnification 
necessary to explain the observed Tf-band flux ratio. 

At the same time, we must consider whether microlens- 
ing would affect the L' flux enough to disturb the agreement 
between the model and data. In a "worst case" scenario we 
might imagine that all the L' emission originates from the 
accretion disk. In this case the L'-band source size would 
need to be -R S rc/^?Em > 1.0-1.3 to keep the predicted flux 
ratio consistent with the observed value. With the IShakural 
|fc Sunyaev| scaling this would imply an associated Tf-band 



source size of R STC /Rvin > 0.54-0.63, which is compatible 
with the range of values needed to reproduce the observed 
AT-band flux ratio. Since it is likely that some of the L' 
emission originates from the larger torus, which would be 
even less sensitive to microlensing, we infer that the ob- 
served L'-band flux ratio is not inconsistent with significant 
microlensing in the 7f-band. 

Given uncertainties in the source sizes, stellar density, 
stellar mass function, and measured flux ratios, we conclude 
that microlensing provides a plausible explanation for the 
observed B/C flux ratio in the K-band. Confirmation of this 
hypothesis will be possible with future A'-band observations 
to quantify variability in the flux ratios. 



7.2 Time delays 



After our modeling was complete, Courbin et al. (2010) pre- 



sented new data for HE0435 including a measurement of the 
velocity dispersion of the lens galaxy and new estimates of 
the time delays between the images derived from six years 
of photometric monitoring. The new time delays have values 
similar to those obtained by |Kochanek et al.| ( |2006| but er- 
rorbars that are a factor of ~2.6 larger. The primary origin 
of the increased uncertainties lies in the analysis methods 



used by the two teams. In particular, Courbin et al. (20101 



find a larger uncertainty in the arrival time of image A. 

There is no obvious way to post-process our modeling 
results to apply the new time delay constraints in a rigor- 
ous fashion. Nevertheless, we can compare the distribution 
of time delays predicted by our models with the new mea- 
surements. This test is statistically meaningful because we 
did not place any constraints on time delays when fitting 
the models. 

Figure [12] shows the comparison. We find that models 
without substructure are somewhat discrepant from mod- 
els that include a few clumps. Furthermore, the predictions 
from our few-clump models are in good agreement with the 
new time delays measurements. It is clear, though, that 
adding time delay constraints would further improve the 
models. As a check, we have also examined predicted time 
delays for our population models (not shown). These, too, 
are in good agreement with the new data, and may bene- 
fit even more from new constraints because populations of 
substructure tend to broaden time delay distributions (see 
Keeton & Moustakas 2009 1. The improvement would be par- 
ticularly relevant for models with high / su b values, since time 
delays scatter with o> oc \Zf BU b- We plan to include the new 
time delays, as well as velocity dispersion constraints, in fu- 
ture work. 



8 CONCLUSIONS 

We have conducted a fully Bayesian analysis of the lens HE 
0435—1223. Combining astrometry from HST with flux ra- 
tios from ground-based monitoring, we probe the mass dis- 
tribution down to milli-arcsecond scales, and assess various 
models for substructure using the Bayesian evidence. We 
also examine multi-wavelength properties of the lens. We 
summarise our conclusions as follows: 

• The observed flux ratio of images A and C cannot be 
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Figure 12. Cumulative probability distributions of the time delays predicted by three models: smooth (black), smooth + clump near 
A (blue), and smooth + clumps near A and B (red). The shaded grayscale indicates the newly- measured time delays from [Courbin| 
|et al.| j2010| . We find that models with substructure generally predict time delays that agree with the data, even though the observed 
values were not used as constraints, whereas models without substructure do less well. Future studies will benefit from adding time delay 
constraints. 



reproduced by macroscopic, smooth lens models. This fail- 
ure cannot be due to microlensing or intrinsic variability 
of the background quasar, since monitoring has quantified 
such variations. Instead, we find that adding a single clump 
near image A whose mass within the Einstein radius is 
log 10 (A/Em) = 7.65lg'g4 (in units of h^ 1 Mq) can account 
for the data. 

• We consider other sources of substructure by includ- 
ing additional clumps near other lensed images. Using the 
Bayesian evidence to compare the various possibilities, we 
find that a model with clumps near images A and B is most 
favoured. This model has an evidence that is 0.63 dex greater 
than our single clump model, and implies a mass for clump 
B of log 10 (M Bin ) = 6.55ti.5i. However, the modest increase 
in the evidence, coupled with evidence uncertainties, makes 
the case for clump B less decisive than the case for clump 
A. 

• We also examine a full ensemble of subhalos, using a 
mass function consistent with CDM predictions (dN/dM oc 
M -1 ' 9 over the range 1O 7 -1O 1O M0) and varying the abun- 
dance of substructure. By examining the Bayesian evidence, 
we infer the mass fraction of substructure to be / su b > 
0.00077 near the Einstein radius. Our measurement of /sub, 



unlike other lensing-based measurements, is fully consistent 
with that predicted by CDM simulations (/ su b ~ 0.002- 
0.003). 

• As part of our substructure analysis, we find that opti- 
mising the macromodel for each realisation of substructure 
cannot necessarily substitute for marginalising the macro- 
model. In the Bayesian framework, full marginalisation is 
important. 

• Near-infrared flux ratio measurements in the K and L 1 
passbands generally agree with those from optical monitor- 
ing. The lone exception is the 7\~-band value of the flux ratio 
B/C. We show that stellar microlensing provides a plausible 
explanation for the K-ha,nA flux of image B, if the A"-band 
source has size log 10 (i? src /") < —6.24. Estimates based on 



accretion disk measurements by Morgan et al. ( 2010 1 suggest 
that such a size is reasonable. Future if-band observations 
can test the microlensing hypothesis. 
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APPENDIX A: CONNECTING INDIVIDUAL 
CLUMPS TO THE POPULATION 

In Section [6] we specify the form of the mass function and 
spatial distribution from which our substructure population 
is drawn. The free parameter in our analysis is the amount 
of substructure, characterised by « s . Here we present a toy 
model to extrapolate from our few-clump models (Section 
Bl and estimate k s . 

For clumps near an image, let A(m) be the area of the 
"region of influence" for a clump with scaled mass m to pro- 
duce a perturbation. Focusing on magnification perturba- 
tions, we have A(m) oc m since magnification perturbations 
are driven by shear perturbations of the form #7 oc m/d 2 
where d is the distance of the clump from the image (see 



5) 



Section 5.2 I. We can set the proportionality constant using 
our few-clump models: if there is a clump with scaled mass 
mi at distance di from image i, then we put 

Ai 



A(m) — — m 



(Al) 



where for simplicity we let Ai be the geometric area nd 2 . 

Under these assumptions, the mean number of clumps 
that lie within the region of influence for image i is 



Ai = / A(m) 



dn 
dm 



dm — 



(A2) 



where is the substructure convergence near image i (qv. 
eqn.[T|. In our main analysis we assume a uniform substruc- 
ture convergence, so K 3 ,i = k s for all images, but this frame- 
work could be made more general. The probability that there 
are Ni clumps affecting the image is a Poisson distribution 
of the form 



P(Ni\Xi) = 



(A3) 



Inspired by our few-clump models, we consider the proba- 
bility that there is one clump affecting image A: 

P(l\\ A ) = \Ae- XA . (A4) 

Since Xa depends on k s , we can reinterpret this as a prob- 
ability distribution for k 3 . Specifically, in a Bayesian frame- 
work with flat priors, the posterior distribution for k s has 
the form 



Polump(«s|w,A, Aa) oc — — k s exp 
rriA 



A A 

/ 

m A 



(A5) 



where mA is the mass of the clump near image A, and Aa = 
ivd A is the area defined by the distance aU of the clump from 
the image. This distribution has a peak at k s — mA/AA, 
a mean of (k s ) = 2mA/ Aa, and a standard deviation of 
<j Kb = \f2mAl Aa- In practice we account for uncertainties 
in mA and Aa by integrating over the allowed range, 



clump 



(«.)= E 



clump 



(K s \m A , A A ) P(m A , A A ) (A6) 



(m A ,A A ) 



where P(m A , Aa) is the posterior probability for the param- 
eter pair (mA, Aa), and the sum runs over allowed pairs. Us- 
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